MDEV-37662: Binlog Corruption When tmpdir is Full #4373
Conversation
Force-pushed from 9213ed1 to 72d0eef
andrelkin left a comment:
The work looks good. Special thanks for a solid analysis!
I have only a few cosmetic notes.
sql/log.cc
Outdated
  Returns TRUE on success, FALSE on error.
*/
static my_bool binlog_cache_reconcile_data_in_storage(IO_CACHE *info)
The new function's purpose is clear, but introducing it as a static function feels like overkill..
sql/log.cc
Outdated
  ready-to-commit (concurrent) transactions could be stalled
*/
if (using_stmt && !thd->binlog_flush_pending_rows_event(TRUE, FALSE) &&
    binlog_cache_reconcile_data_in_storage(
.. so instead, my_b_get_pos_in_file(info) && flush_io_cache(info) would be fair enough at this point, as a new inline. Why not flush_pending_bytes(info)? "flush" in the name reads naturally, and is also consistent with the names of the caller and its siblings.
The suggested name also calls for (much) shorter comments than your static function's header comments.
Indeed, the whole point of the caller function is to get everything to disk; as it turns out in this bug, that happens on a few levels.
I like the inline change, but I'm still not sold on the new function name (though I'm not sold on the current one either). It isn't guaranteed to flush anything if the content is held entirely in memory. I'll think on a better name for a couple of days before pushing.
flush_write_buffer_if_needed() 🤔
That's closer, and to finish it off, I think flush_write_buffer_if_file_backed() would make it more specific.
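The behavior the reviewers converged on can be illustrated with a self-contained toy model. Everything here (ToyCache and its members) is an illustrative stand-in, not MariaDB's actual IO_CACHE API: the helper flushes pending bytes only when the cache has already overflowed to a backing file, and is a cheap no-op for a purely in-memory cache.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Toy model of a write cache that keeps data in memory until it
// exceeds mem_limit, then spills to a temporary file -- loosely
// mirroring how a binlog cache overflows into a tmp file.
struct ToyCache {
  size_t mem_limit;
  std::string buf;             // in-memory write buffer
  std::FILE *spill = nullptr;  // backing tmp file once overflowed
  size_t spilled = 0;          // bytes already flushed to the file

  explicit ToyCache(size_t limit) : mem_limit(limit) {}
  ~ToyCache() { if (spill) std::fclose(spill); }

  void write(const char *data, size_t len) {
    buf.append(data, len);
    if (buf.size() > mem_limit && !spill)
      spill = std::tmpfile();  // first overflow: create backing file
  }

  bool file_backed() const { return spill != nullptr; }

  // The pattern under discussion: push pending bytes to the tmp file
  // only when the cache has already overflowed to disk. A purely
  // in-memory cache needs no system call at all.
  bool flush_write_buffer_if_file_backed() {
    if (!file_backed())
      return true;             // nothing on disk; nothing to do
    size_t n = std::fwrite(buf.data(), 1, buf.size(), spill);
    if (n != buf.size() || std::fflush(spill) != 0)
      return false;            // e.g. tmpdir full -> report early
    spilled += n;
    buf.clear();
    return true;
  }
};
```

The key design point mirrored here is that a flush failure is reported from the helper itself, before any caller-side state (such as an already-written GTID event) depends on the data being durable.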
Force-pushed from b989750 to 7707e5c
Brandon Nesterenko ***@***.***> writes:
+ 1) we write the GTID event separately to the binlog directly before
+ moving the cache data, and if the reconciliation fails (e.g. if the
+ directory storing the tmp file is full), the binlog would get
+ corrupted with a standalone GTID event
+ 2) that happens during group commit with locks held, and other
+ ready-to-commit (concurrent) transactions could be stalled
Yes. The GTID allocation happens with very critical global locks held. The
code should do the minimum necessary while the global mutex is held,
certainly not do an unnecessary system call. If we already have all the data
in memory and are just about to write it to the binlog, there seems little
reason to do extra system calls to write something to a temporary file and
read it back. I suspect it is just an artifact of the way code happened to
be written at the time this was implemented.
This is in any case fixed properly in the new binlog implementation
MDEV-34705. The binlog is recovered consistently with the table data even in
case of crash, and event data never need to be written to any temporary
files, which seems a much better solution.
- Kristian.
The binary log could be corrupted when committing a large transaction
(i.e. one whose data exceeds the binlog_cache_size limit and spills
into a tmp file) in binlog_format=row if the server's --tmp-dir is
full. The resulting corruption is that only the GTID of the errored
transaction is written into the binary log, without any body or
finalizing events. This happened because the transaction's content
wasn't flushed at the proper time: its binlog cache data was not yet
durable when the server tried to copy the content from the binlog
cache file into the binary log itself. While switching the tmp file
from a WRITE_CACHE to a READ_CACHE, the server would see there was
still data to flush in the cache, and would first try to flush it.
This is not a valid time to flush that data to the temporary file,
though, as:
1. The GTID event has already been written directly to the binary
log. So if this flushing fails, it leaves the binary log in a
corrupted state.
2. This is done during group commit, and will slow down other
concurrent transactions, which are otherwise ready to commit.
This patch fixes these issues by ensuring all transaction data is
fully flushed to its temporary file (if used) before starting any
critical paths, i.e. in binlog_flush_cache(). Note that if the binlog
cache is solely in-memory, this flush-to-temporary-file is skipped.
Reviewed-by: Andrei Elkin <[email protected]>
Signed-off-by: Brandon Nesterenko <[email protected]>
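The ordering change described in the commit message can be sketched with a toy model. The names here (commit_to_binlog, ToyTxn, the string standing in for the binlog) are hypothetical, not the real binlog_flush_cache() code: the point is that a tmp-file flush failure is surfaced before the GTID event touches the binlog, so a full tmpdir can no longer leave a lone GTID behind.

```cpp
#include <cassert>
#include <string>

// Illustrative stand-in for a transaction's binlog cache state.
struct ToyTxn {
  bool spills_to_tmpfile;  // cache exceeded binlog_cache_size
  bool tmpdir_full;        // simulated ENOSPC on the tmp directory
};

// Before the fix: the GTID was written first, and a late flush
// failure left a lone GTID event in the binlog (corruption).
// After the fix (modeled here): flush pending cache data to the tmp
// file *before* the critical group-commit path, so the error is
// raised while the binlog is still untouched.
bool commit_to_binlog(const ToyTxn &txn, std::string &binlog) {
  // Step 1: make the cache durable first; fail early on ENOSPC.
  if (txn.spills_to_tmpfile && txn.tmpdir_full)
    return false;                // binlog not modified at all
  // Step 2: only now write the GTID event and the cache content.
  binlog += "[GTID]";
  binlog += "[body][commit]";
  return true;
}
```

On the error path the binlog string is never touched, matching the invariant the patch restores: either the whole transaction reaches the binary log, or none of it does.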
Force-pushed from 7707e5c to 0f5e5ce
This PR is organized as follows:
Commit 1: The regression that reproduces the bug report
Commit 2: The fix
Any further commits will address review feedback.